136 research outputs found

    Protein beta-turn assignments

    A classical way to analyze protein 3D structures or models is to investigate their secondary structures. Secondary structure predictions are also widely used to help build new 3D models, and hundreds of prediction methods have been proposed. Before any prediction, however, secondary structures must be assigned, which is not a trivial task; numerous but diverging assignment methods have therefore been developed. β-turns constitute the third most important secondary structure. However, no analysis comparing β-turn distributions across different secondary structure assignment methods has ever been done. In this paper, we analyze and evaluate the results of such a comparison. We highlight some important divergences that could have significant consequences for the analysis and prediction of β-turns.

    An agnostic analysis of the human AlphaFold2 proteome using local protein conformations

    For more than 30 years, different computational approaches have been implemented to propose 3D structural models of proteins from their amino acid sequence. Using deep learning, AlphaFold 2 obtained particularly remarkable results; some models were within the uncertainties of the experimental resolution (Jumper et al., Nature 2021). The AlphaFold 2 code is freely available and EBI provides structural model databases (Tunyasuvunakool et al., Nature 2021) covering 98.5% of the human proteome. 36% of these models are predicted with atomistic quality. The human protein models provided by AlphaFold were analyzed using its confidence index (pLDDT score), with classic secondary structure and finer analysis of local protein conformation, e.g. γ-turns, β-turns and bends, β-turn types, PolyProline II (PPII), helix curvatures, β-bulges, and a structural alphabet, namely Protein Blocks (PB). As expected, the large majority of α-helices are well predicted with high pLDDT scores. However, some points are intriguing and could potentially lead to improvements in the future: (i) PPII helices are too often encountered with a low confidence index. They represent 4-5% of all residues and are important in protein-protein interactions, so their poor approximation could be an issue. (ii) Very surprisingly, while β-turns (turns of 4 residues) are well predicted, 55% of γ-turns (3 residues) have very low pLDDT scores. (iii) Even more strikingly, 94.8% of cis ω angles are associated with low pLDDT scores, i.e. AlphaFold is clearly unable to propose proper cis ω angles. (iv) β-sheet occurrence is lower than expected, while PB d (i.e. β-sheet core geometry) occurrence is completely in accordance with the expected frequencies. There may therefore be β-sheets that were not completely formed in the models, which would explain this low frequency (de Brevern, Biochimie 2023).
AlphaFold 2 has had an impact on the structural modeling field, but work remains (Tourlet et al., BioMedInformatics 2023). Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202

    Use of a structural alphabet for analysis of short loops connecting repetitive structures

    BACKGROUND: Because loops connect regular secondary structures, analysis of the former depends directly on the definition of the latter. The numerous assignment methods, however, can offer different definitions. In a previous study, we defined a structural alphabet composed of 16 average protein fragments, which we called Protein Blocks (PBs). They allow an accurate description of every region of 3D protein backbones and have been used in local structure prediction. In the present study, we use this structural alphabet to analyze and predict the loops connecting two repetitive structures. RESULTS: We first analyzed the secondary structure assignments. Use of five different assignment methods (DSSP, DEFINE, PCURVE, STRIDE and PSEA) showed the absence of consensus: 20% of the residues were assigned to different states. The discrepancies were particularly important at the extremities of the repetitive structures. We used PBs to describe and predict the short loops because they can help analyze and in part explain these discrepancies. An analysis of the PB distribution in these regions showed some specificities in the sequence-structure relationship. Of the amino acid over- or under-representations observed in the short loop databank, 20% did not appear in the entire databank. Finally, predicting 3D structure in terms of PBs with a Bayesian approach yielded an accuracy rate of 36.0% for all loops and 41.2% for the short loops. Specific learning in the short loops increased the latter by 1%. CONCLUSION: This work highlights the difficulties of assigning repetitive structures and the advantages of using more precise descriptions, that is, PBs. We observed some new amino acid distributions in the short loops and used this information to enhance local prediction. Instead of describing entire loops, our approach predicts each position in the loops locally. It can thus be used to propose many different structures for the loops and to probe and sample their flexibility. 
It can be a useful tool in ab initio loop prediction.
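
The absence of consensus reported in this abstract (20% of residues assigned to different states) can be illustrated with a minimal sketch. The function name `disagreement_rate`, the method labels and the three-state strings (H, E, C) below are made-up toy data for illustration only; they are not output from DSSP, DEFINE, PCURVE, STRIDE or PSEA, and this is not the authors' procedure.

```python
def disagreement_rate(assignments):
    """Fraction of residues that do not receive the same state from every method.

    `assignments` maps a method label to a string with one state per residue.
    """
    lengths = {len(states) for states in assignments.values()}
    if len(lengths) != 1:
        raise ValueError("all assignments must cover the same residues")
    n_residues = lengths.pop()
    if n_residues == 0:
        raise ValueError("empty assignment")
    # zip(*...) yields, for each residue, the states given by all methods.
    columns = zip(*assignments.values())
    differing = sum(1 for column in columns if len(set(column)) > 1)
    return differing / n_residues


# Hypothetical toy assignments for an 18-residue fragment.
toy = {
    "method_A": "CCHHHHHHCCEEEEECCC",
    "method_B": "CCCHHHHHCCEEEEECCC",
    "method_C": "CCHHHHHCCCEEEECCCC",
}
rate = disagreement_rate(toy)
print(f"{rate:.1%} of residues lack a consensus state")
```

In this toy case the disagreements sit at the ends of the helix and of the strand, which mirrors the observation that discrepancies concentrate at the extremities of repetitive structures.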

    Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments

    BACKGROUND: Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values, and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) in microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. We used several datasets, from both kinetic and non-kinetic experiments, in yeast and human. RESULTS: We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that imputed MVs still have important effects on the stability of the gene clusters. The improvement in clustering obtained with hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and gene cluster stability. Even if comparing clustering algorithms is a complex task, we observed that the k-means approach conserves gene associations more efficiently. CONCLUSIONS: More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach is an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability.
Our study highlights the need for a systematic assessment of imputation methods and thus for dedicated benchmarks. A noticeable point is the specific influence of some biological datasets.
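
The k-Nearest Neighbour idea mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in the study: the function name `knn_impute`, the use of a root-mean-square Euclidean distance over jointly observed conditions, the unweighted mean of the neighbours, and the toy expression matrix are all choices made here for clarity.

```python
import math


def knn_impute(matrix, k=2):
    """Return a copy of `matrix` (rows = genes, columns = conditions)
    in which each None entry is replaced by the mean of that column
    over the k most similar genes that have a value there."""

    def distance(a, b):
        # Compare two genes only on conditions observed in both.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes with an observed value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:  # leave the gap as None if no usable neighbour exists
                result[i][j] = sum(nearest) / len(nearest)
    return result


# Hypothetical toy expression matrix with one missing value.
expr = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
filled = knn_impute(expr, k=2)
print(filled[0])
```

Here the missing value is estimated from the two genes whose profiles are closest to the incomplete one, while the distant fourth gene is ignored.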

    Analysis of protein chameleon sequence characteristics

    Conversion of the local structural state of a protein from an α-helix to a β-strand is usually associated with a major change in the tertiary structure. Similar changes were observed during the self-assembly of amyloidogenic proteins to form fibrils, which are implicated in severe disease conditions, e.g., Alzheimer disease. Studies have emphasized that certain protein sequence fragments, known as chameleon sequences, do not have a strong preference for either helical or extended conformations. Surprisingly, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy. Here we report a large-scale analysis of chameleon sequences to estimate their propensities to be associated with different local structural states such as α-helices, β-strands and coils. With the help of the propensity information derived from the amino acid composition, we underline their complexity, as more than one quarter of them prefer the coil state over the regular secondary structures. About half of them show a preference for both α-helix and β-sheet conformations, and either of these two states is favored by the rest.

    Analysis of HSP90-related folds with MED-SuMo classification approach

    Three-dimensional structural information is critical for understanding functional protein properties and the precise mechanisms of protein functions implicated in physiological and pathological processes. Comparison and detection of protein binding sites are key steps for annotating structures with functional predictions and are extremely valuable steps in a drug design process. In this research area, MED-SuMo is a powerful technology to detect and characterize similar local regions on protein surfaces. Each amino acid residue’s potential chemical interactions are represented by specific surface chemical features (SCFs). The MED-SuMo heuristic is based on the representation of binding sites by a graph structure suitable for exploration by an efficient comparison algorithm. We use this approach to analyze one particular SCOP superfamily which includes HSP90 chaperone, MutL/DNA topoisomerase, histidine kinases, and α-ketoacid dehydrogenase kinase C (BCK). They share a common fold and a common region for ATP-binding. To analyze both similar and differing features of this fold, we use a novel classification method, the MED-SuMo multi approach (MED-SMA). We highlight common and distinct features of these proteins. The different clusters created by MED-SMA yield interesting observations. For instance, one cluster gathers three types of proteins (HSP90, topoisomerase VI, and BCK) which all bind the drug radicicol.

    Protein secondary structure assignment revisited: a detailed analysis of different assignment methods

    BACKGROUND: A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined and it is difficult to decide unambiguously which residues at the edge of the segments have to be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure. METHODS: To address these problems, we have developed a method for secondary structure assignment, called KAKSI. Assignments made by KAKSI are compared with assignments given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as secondary structures found in PDB files, on 4 datasets (X-ray structures with different resolution ranges, NMR structures). RESULTS: A detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in cases of one-to-one correspondence between the segments. However, KAKSI also tends to favor the assignment of several short helices when STRIDE and PSEA assign longer, kinked helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB. They are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of cases of secondary structure assignments that illustrate this behavior. CONCLUSION: Our method provides valuable assignments which favor the regularity of secondary structure segments.

    Computational fragment-based drug design to explore the hydrophobic subpocket of the mitotic kinesin Eg5 allosteric binding site

    Eg5, a mitotic kinesin exclusively involved in the formation and function of the mitotic spindle, has attracted interest as an anticancer drug target. Eg5 is co-crystallized with several inhibitors bound to its allosteric binding pocket. Each of these occupies a pocket formed by loop 5/helix alpha2 (L5/alpha2). Recently designed inhibitors additionally occupy a hydrophobic pocket of this site. The goal of the present study was to explore this hydrophobic pocket with our MED-SuMo fragment-based protocol, and thus discover novel chemical structures that might bind as inhibitors. The MED-SuMo software is able to compare and superimpose similar interaction surfaces upon the whole protein data bank (PDB). In a fragment-based protocol, MED-SuMo retrieves MED-Portions that encode protein-fragment binding sites and are derived from cross-mining protein-ligand structures with libraries of small molecules. Furthermore, we have excluded intra-family MED-Portions derived from Eg5 ligands that occupy the hydrophobic pocket and predicted new potential ligands by hybridization that would simultaneously fill both pockets. Some of the latter, having original scaffolds and substituents in the hydrophobic pocket, are identified in libraries of synthetically accessible molecules by the MED-Search software.

    ABCG2 Is Overexpressed on Red Blood Cells in Ph-Negative Myeloproliferative Neoplasms and Potentiates Ruxolitinib-Induced Apoptosis

    Acknowledgments: The authors would like to thank Dominique Gien, Sirandou Tounkara, and Eliane Véra at Centre National de Référence pour les Groupes Sanguins for the management of blood samples. Funding: The work was supported by Institut National de la Santé et de la Recherche Médicale (Inserm), Institut National de la Transfusion Sanguine (INTS), the University of Paris, and grants from Laboratory of Excellence (Labex) GR-Ex, reference No. ANR-11-LABX-0051. The Labex GR-Ex is funded by the IdEx program “Investissements d’avenir” of the French National Research Agency, reference No. ANR-18-IDEX-0001. R.B. was funded by the European Union’s Horizon 2020 Research and Innovation Program under grant agreement No. 675115-RELEVANCE-H2020-MSCA-ITN-2015. M.B. was funded by Ministère de l’Enseignement Supérieur et de la Recherche at the BioSPC Doctoral School. R.B. and M.B. also received financial support from Société Française d’Hématologie (SFH) and Club du Globule Rouge et du Fer (CGRF). Peer reviewed.

    Assignment of PolyProline II Conformation and Analysis of Sequence – Structure Relationship

    BACKGROUND: Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns, and the rest is associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. The PolyProline II (PPII) helix is yet another interesting repetitive structure, which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could have an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS: A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Few constraints were added to the assignment, and only PPII helices at least 2 residues long are defined. CONCLUSIONS/SIGNIFICANCE: The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acids as PPII. Sequence-structure relationships associated with PPII, as defined by the different SSAMs, show a few striking differences. A specific study of amino acid preferences in their N- and C-cap regions was carried out, as well as of their solvent accessibility and contact patterns.
The assignment of PPII can thus be coupled with DSSP, which opens a simple way for further analysis in this field.